Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus

Authors

  • Alexander Baturo
  • Niheer Dasandi
  • Slava J. Mikhaylov
Abstract

Every year at the United Nations, member states deliver statements during the General Debate discussing major issues in world politics. These speeches provide invaluable information on governments’ perspectives and preferences on a wide range of issues, but have largely been overlooked in the study of international politics. This paper introduces a new dataset consisting of over 7,300 country statements from 1970–2014. We demonstrate how the UN General Debate Corpus (UNGDC) can be used to derive country positions on different policy dimensions using text analytic methods. The paper provides applications of these estimates, demonstrating the contribution the UNGDC can make to the study of international politics.


Similar resources

A'lam Corpus: A Standard Named-Entity Corpus for the Persian Language

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...


Topology Analysis of International Networks Based on Debates in the United Nations

In complex, high dimensional and unstructured data it is often difficult to extract meaningful patterns. This is especially the case when dealing with textual data. Recent studies in machine learning, information theory and network science have developed several novel instruments to extract the semantics of unstructured data, and harness it to build a network of relations. Such approaches serve...


SMT at the International Maritime Organization: experiences with combining in-house corpora with out-of-domain corpora

This paper presents a machine translation tool – based on Moses – developed for the International Maritime Organization (IMO) for the automatic translation of documents from Spanish, French, Russian and Arabic to/from English. The main challenge lies in the insufficient size of in-house corpora (especially for Russian and Arabic). The United Nations (UN) granted IMO the right to use UN resources...


A Mutually Beneficial Integration of Data Mining and Information Extraction

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX, that combines IE and data mining methodologies to perform text mining as well as...


The Title of a Literary Text as a Discursive Phenomenon

Modern text linguistics pays serious attention to the significant structural elements of the text, which carry special knowledge. Such structural elements include the title. In this article, the title is considered as a linguistic and cognitive characteristic and a spatially fixed structural element of the text – «frame», which is located around/before/behind the text, focusing on the importanc...



Journal:
  • CoRR

Volume: abs/1707.02774

Publication date: 2017